
[Bugfix] align input_token_ids with gpu_block_ids #7574

Open

zccjjj wants to merge 2 commits into PaddlePaddle:develop from zccjjj:bugfix

Conversation

@zccjjj (Contributor) commented Apr 23, 2026

Motivation

Used to build fd for the 0423 local image.
Includes the following changes:
ccbd245
https://github.com/PaddlePaddle/FastDeploy/pull/7455/changes#diff-87df1cd74b84894a10d9008fa9d93f7f0a3161355b7edcef771e2b32a9c57909

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot (Bot) commented Apr 23, 2026

Thanks for your contribution!

@zccjjj zccjjj changed the title bugfix [Bugfix] align input_token_ids with gpu_block_ids Apr 23, 2026
PaddlePaddle-bot

This comment was marked as outdated.

codecov-commenter commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5e87930).

Files with missing lines Patch % Lines
...transfer_factory/mooncake_store/attention_store.py 28.57% 5 Missing ⚠️
fastdeploy/cache_manager/prefix_cache_manager.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7574   +/-   ##
==========================================
  Coverage           ?   72.23%           
==========================================
  Files              ?      419           
  Lines              ?    57853           
  Branches           ?     9076           
==========================================
  Hits               ?    41788           
  Misses             ?    13212           
  Partials           ?     2853           
Flag Coverage Δ
GPU 72.23% <33.33%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-24 21:29:26

📋 Review Summary

PR overview: fixes the length mismatch between `input_token_ids` and the truncated `gpu_block_ids` in `write_cache_to_storage`, and adds `splitwise_role` support plus environment-variable override configuration to `AttentionStore`.
Changed files: `cache_manager/prefix_cache_manager.py`, `cache_manager/transfer_factory/mooncake_store/attention_store.py`, `cache_manager/cache_transfer_manager.py`
Impact tag: [KVCache]


📝 PR Convention Check

  1. The title tag uses the wrong casing: the PR currently uses [Bugfix], but the official tag list spells it [BugFix] (capital F). Suggested fix:

    • Suggested title (copy-paste ready): [BugFix] align input_token_ids with gpu_block_ids
  2. The description is unfilled: the required Motivation and Modifications sections are both empty (only the template comments remain), so the bug's trigger conditions and the concrete changes cannot be determined. Suggested content to fill in:

## Motivation
When calling `write_cache_to_storage`, the length of `input_token_ids` is not aligned with the truncated `gpu_block_ids` (`request.block_tables[:len(keys)]`), so the token-id array written to storage does not match the actual number of blocks.

## Modifications
1. `prefix_cache_manager.py`: remove the `enable_output_caching` branch, always append output_token_ids, and truncate `input_token_ids` to `len(keys) * block_size` before writing to storage, keeping it aligned with `gpu_block_ids`.
2. `attention_store.py`: add a `splitwise_role` config field and support overriding the initial configuration via the environment variables `AS_NAMESPACE`/`AS_POD_NAME`/`AS_MODEL_VERSION`/`ENABLE_EP_DP_IN_FD`, to fit pod naming in splitwise scenarios.
3. `cache_transfer_manager.py`: pass the `splitwise_role` parameter through in `_init_storage`.
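The truncation described in item 1 above can be illustrated with a minimal, self-contained sketch; all numbers and variable values here are hypothetical and chosen only to show the alignment, not taken from the actual FastDeploy code:

```python
# Hypothetical numbers: each cache block holds block_size tokens, and
# `keys` are the blocks actually written to storage.
block_size = 4
token_ids = list(range(10))            # 10 prompt tokens
output_token_ids = [100, 101, 102]     # 3 generated tokens
keys = ["blk0", "blk1", "blk2"]        # 3 blocks -> at most 12 tokens

# Always append output tokens, then truncate to the block-aligned length
# so the token array matches gpu_block_ids = block_tables[:len(keys)].
input_token_ids = token_ids + output_token_ids
input_token_ids = input_token_ids[: len(keys) * block_size]

print(len(input_token_ids))  # 12 tokens, exactly 3 blocks of 4
```

With these sample values the 13 concatenated tokens are cut back to 12, so the last partial token (102) is dropped and the array covers exactly the 3 stored blocks.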

Issues

Level | File | Summary
❓ Question | prefix_cache_manager.py:1123 | After removing the `enable_output_caching` condition, output_token_ids is always appended; confirm the prefix cache is not polluted when `enable_output_caching=False`
🟡 Suggestion | attention_store.py:69 | `int(os.getenv("ENABLE_EP_DP_IN_FD", "1"))` lacks error handling; a non-integer environment value raises `ValueError` and crashes initialization

Overall Assessment

The core token_ids alignment fix (truncating to `len(keys) * block_size`) is logically clear and heading in the right direction, but the intent behind removing the `enable_output_caching` branch needs further confirmation; the author should add an explanation to rule out potential prefix cache pollution.

```diff
-            input_token_ids = token_ids + request.output_token_ids
-        else:
-            input_token_ids = token_ids
+        input_token_ids = token_ids + request.output_token_ids
```

❓ Question: Is removing the `enable_output_caching` condition intended?

Previously, when `enable_output_caching=False`, `input_token_ids` contained only input tokens; now `request.output_token_ids` is always appended regardless of the setting.

Although the subsequent truncation `input_token_ids[: len(keys) * block_size]` may trim the output tokens off, that depends on how far `keys` extends. If, with `enable_output_caching=False`, the number of `keys` unexpectedly covers the output-token range, the cached content will include output tokens that should not have been written, potentially polluting future prefix cache hits.

Please confirm: when `enable_output_caching=False`, is the length of `keys` strictly bounded by the number of blocks covering the input? If that is guaranteed, consider adding a comment here documenting the assumption.

```python
try:
    self.config.namespace = os.getenv("AS_NAMESPACE", self.config.namespace)
    self.config.pod_name = os.getenv("AS_POD_NAME", self.config.pod_name)
    if int(os.getenv("ENABLE_EP_DP_IN_FD", "1")):
```

🟡 Suggestion: `int(os.getenv(...))` lacks error handling and is not robust.

If the environment variable ENABLE_EP_DP_IN_FD is set to a non-integer string (such as "true" / "yes" / "on"), `int()` raises a `ValueError` and the initialization crashes.

A more robust form:

    if os.getenv("ENABLE_EP_DP_IN_FD", "1") != "0":
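Beyond that one-line fix, a small helper that tolerates common truthy/falsy spellings could be sketched as follows; the `env_flag` name and the falsy set are hypothetical, not part of the PR:

```python
import os

# Hypothetical helper (not in the PR): parse a boolean-ish env var without
# crashing on values like "true", "yes", or "on".
_FALSY = {"0", "false", "no", "off", ""}

def env_flag(name: str, default: bool = True) -> bool:
    raw = os.getenv(name)
    if raw is None:
        return default
    # Anything not explicitly falsy counts as enabled.
    return raw.strip().lower() not in _FALSY

# Usage sketch, replacing int(os.getenv("ENABLE_EP_DP_IN_FD", "1")):
enable_ep_dp = env_flag("ENABLE_EP_DP_IN_FD", default=True)
```

This keeps the current default-on behavior while accepting "true"/"yes"/"on" instead of raising.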
